Primer & vector removal/cleaning. Trim/cut the vector/primer from the contig. Sequence recognition for automatic vector removal

Sequence recognition
Removing vector/primer sequences

When does DNA Sequence Assembler remove vector sequences?

DNA Baser Assembler removes contaminant vector sequences in the following cases:

- from single sequences, when they are opened in the editing window
- from contigs, at the end of the assembly process (automatically)

How does DNA Sequence Assembler remove vector sequences?

DNA Baser Assembler removes contaminant vector sequences with an algorithm based on recognition sequences. Whenever a recognition sequence is found as a complete match in the contig or single sequence, the vector sequence is removed.
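A match is accepted only when the whole recognition sequence is found. A minimal Python sketch of that rule, assuming plain A/C/G/T strings; `find_complete_match` is a hypothetical name, not a DNA Baser function:

```python
def find_complete_match(sequence: str, recognition: str) -> int:
    """Return the 0-based start of the first complete (exact) match of
    `recognition` in `sequence`, or -1 if it is not found in full.
    The comparison is case-insensitive."""
    return sequence.upper().find(recognition.upper())


contig = "GGGGGACGAATTCGCCCTTCCCCC"
print(find_complete_match(contig, "ACGAATTCGCCCTT"))  # 5
print(find_complete_match(contig, "ACGAATTCGCCCTA"))  # -1: one mismatch means no match
```

Because a single mismatch prevents the match, a sequencing error inside the recognition site leaves the vector in place until the error is corrected.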

The recognition sequences can be one of the following:
- short vector sequences flanking the insert
- primers used to obtain the PCR amplicons before cloning (not the vector primers used for sequencing). For details, see the figure below.

When you use the vector sequences flanking the insert as recognition sequences, the removed fragments should also include the recognition sequences. To do this, check the corresponding option.

Figure: vector recognition sequences based on flanking vector regions

When you use the primers as recognition sequences, you can keep the recognition sequences in the resulting sequence. To do this, check the corresponding option.
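The difference between cutting and keeping the recognition sequences can be sketched as follows. This assumes the contig already reads in the sense orientation and uses a hypothetical `keep_recognition` flag in place of the checkbox; `trim_vector` is not a DNA Baser function:

```python
def trim_vector(contig: str, left_site: str, right_site: str,
                keep_recognition: bool) -> str:
    """Remove vector sequence outside the insert (hypothetical helper).

    `left_site` and `right_site` are given exactly as they read in the
    contig (right_site already reverse-complemented).
    keep_recognition=False: the sites are cut together with the vector.
    keep_recognition=True: the sites stay in the result.
    A site that is not found leaves that end of the contig untouched.
    """
    s = contig.upper()
    start, end = 0, len(contig)
    i = s.find(left_site.upper())
    if i >= 0:
        start = i if keep_recognition else i + len(left_site)
    j = s.rfind(right_site.upper())
    if j >= 0:
        end = j + len(right_site) if keep_recognition else j
    if start > end:  # sites overlap or are in the wrong order: do nothing
        return contig
    return contig[start:end]


contig = "TTTTT" + "ACGAATTCGCCCTT" + "GATTACA" + "AAGGGCGAATTCGC" + "TTTTT"
print(trim_vector(contig, "ACGAATTCGCCCTT", "AAGGGCGAATTCGC", keep_recognition=False))  # GATTACA
print(trim_vector(contig, "ACGAATTCGCCCTT", "AAGGGCGAATTCGC", keep_recognition=True))
```

With flanking vector sites you would normally cut them; with PCR primers as sites, keeping them returns the full amplicon.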

The recognition sequences need to meet the following criteria:
- read in the 5' --> 3' direction, they are located in front of the insert
- they are entered in the database in the 5' --> 3' direction

Figure: orientation (sense) of the recognition sequences

Thus, if the recognition sequences are short vector sequences flanking the insert, then:
- one of the recognition sequences will match the sense strand, direction 5' --> 3'
- the second recognition sequence will match the antisense strand, direction 5' --> 3'

DNA Baser automatically reverse-complements them when searching, so the orientation of the contig does not affect the search results.
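A sketch of the two-orientation search, assuming plain A/C/G/T sequences (ambiguity codes are left out here); `reverse_complement` and `locate` are hypothetical helper names:

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a plain A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def locate(contig: str, recognition: str):
    """Find `recognition` (entered 5'->3') in either orientation.

    Returns ('+', start) for a direct match, ('-', start) for a match of
    its reverse complement, or None if neither matches completely.
    """
    s = contig.upper()
    site = recognition.upper()
    i = s.find(site)
    if i >= 0:
        return ("+", i)
    i = s.find(reverse_complement(site))
    if i >= 0:
        return ("-", i)
    return None


rs1, rs2 = "ACGAATTCGCCCTT", "GCGAATTCGCCCTT"
sense = "TTTTT" + rs1 + "GATTACA" + reverse_complement(rs2) + "TTTTT"
antisense = reverse_complement(sense)
print(locate(sense, rs1), locate(sense, rs2))          # ('+', 5) ('-', 26)
print(locate(antisense, rs1), locate(antisense, rs2))  # ('-', 26) ('+', 5)
```

Whichever strand the contig represents, both sites are found; only the reported orientation changes.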

IMPORTANT

The recognition sequences must be different; otherwise, the program will cut too much from your contig. Example:

OK
recognition sequence 1: ACGAATTCGCCCTT
recognition sequence 2: GCGAATTCGCCCTT

BAD
recognition sequence 1: CGAATTCGCCCTT
recognition sequence 2: CGAATTCGCCCTT

The recognition sequences must also be different when one of them is reverse complemented:

OK
recognition sequence 1: ACGAATTCGCCCTT
recognition sequence 2: GCGAATTCGCCCTT

BAD
recognition sequence 1: CGAATTCGCCCTT
recognition sequence 2: AAGGGCGAATTCG (reverse complemented, this becomes CGAATTCGCCCTT)
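Before activating a set of recognition sequences, you could check them against both rules with a short script. This is a hypothetical helper, not part of DNA Baser, and it only tests for identity, as in the examples above:

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a plain A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def conflicting_pairs(database: dict) -> list:
    """Return name pairs whose recognition sequences are identical,
    directly or after reverse-complementing one of them."""
    conflicts = []
    for (name_a, a), (name_b, b) in combinations(database.items(), 2):
        a, b = a.upper(), b.upper()
        if a == b or a == reverse_complement(b):
            conflicts.append((name_a, name_b))
    return conflicts


print(conflicting_pairs({"rs1": "ACGAATTCGCCCTT", "rs2": "GCGAATTCGCCCTT"}))  # []
print(conflicting_pairs({"rs1": "CGAATTCGCCCTT", "rs2": "CGAATTCGCCCTT"}))    # [('rs1', 'rs2')]
print(conflicting_pairs({"rs1": "CGAATTCGCCCTT", "rs2": "AAGGGCGAATTCG"}))    # [('rs1', 'rs2')]
```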

Using DNA Baser for removing vector sequences

The Vector cleaning window can be opened from the View -> Vector cleaning menu. The keyboard shortcut is Ctrl+P.


Recognition sequence database

Store all your recognition sequences here so that you have them at hand. When processing a contig, DNA Baser uses a recognition sequence only if it is active (checked). Recognition sequences are automatically reverse-complemented, and IUPAC ambiguity codes are supported.
By default, the database contains a few example recognition sequences. To add your own recognition sequences, see the 'Add new recognition sequences' section.

Add new recognition sequences

To add a new recognition sequence to your database, first fill in the Name and Recognition sequence boxes, then press the Add button.
You must enter recognition sequences in the 5' --> 3' direction. Degenerate recognition sequences (containing ambiguous bases) are supported.

Delete

Deletes the selected recognition sequences from the database. Shortcut: press the DELETE key while the list has focus.

Active recognition sequences. Activating a recognition sequence.

DNA Baser uses a recognition sequence only if it is active. To activate a recognition sequence, check the checkbox in front of it.
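The effect of the checkboxes can be modelled as a simple filter. The `RecognitionEntry` class, the `active_sequences` function and the example sequences below are hypothetical; they only illustrate that unchecked entries are ignored, and that with no entry checked nothing is searched for:

```python
from dataclasses import dataclass


@dataclass
class RecognitionEntry:
    """One row of the recognition sequence database (hypothetical model)."""
    name: str
    sequence: str          # entered 5'->3'
    active: bool = False   # the checkbox in front of the entry


def active_sequences(database: list) -> list:
    """Only checked entries are used when a contig or sequence is processed."""
    return [entry.sequence for entry in database if entry.active]


db = [
    RecognitionEntry("Demo (unchecked)", "GGATCCAAGCTTGC"),
    RecognitionEntry("My left site", "ACGAATTCGCCCTT", active=True),
    RecognitionEntry("My right site", "GCGAATTCGCCCTT", active=True),
]
print(active_sequences(db))  # ['ACGAATTCGCCCTT', 'GCGAATTCGCCCTT']
```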

Edit an existing recognition sequence

To edit an existing recognition sequence, double-click it. The recognition sequence disappears from the database and appears in the 'Add recognition sequence for vector cleaning' box. Edit its name or sequence as needed, then press the Add button to add it back to your database. When you are done, press the OK button to save the database, close this window, and return to the DNA Baser main interface.

OK button

Saves the database and closes this window. The database is saved in the 'c:\DNA Baser\system' folder as 'Primers.txt.pDB', and a backup copy is saved in the same folder (assuming that DNA Baser is installed in 'c:\DNA Baser').
Changes to the database take effect only after you save it by pressing OK. If you quit DNA Baser without pressing this button, all changes you made to the database are lost.

Cancel button

Closes this window without saving changes to the database.

How to use the vector removal tool:

The recognition sequences are used to remove vector sequences in two cases:
- from single sequences when they are opened in the editing window
- from contigs, at the end of the assembly process

The sample recognition sequences are provided for demonstration purposes only, and you will most likely not use them. You can remove them from the database with the Delete button.

The first step is to add the recognition sequences you want to use to the database. Then activate them by checking the checkbox in front of each one. More than one recognition sequence can be active at the same time. Close this window by pressing the OK button.
From now on, every time you assemble sequences into a new contig or open a sequence in the EDIT window, DNA Baser cuts the vector sequences (see the figure below) from the contig or single sequence, provided that a match with a recognition sequence is found. If a sequencing error in the recognition site prevents the match, the recognition sequence is detected and highlighted in blue as soon as you correct the error. You can either keep or cut the recognition sequence itself.

Vector sequences are removed from the contig or single sequence ONLY when it is saved to your hard disk. Until then, the vector sequences remain visible on screen in the Assembly window, and the recognition sequences are highlighted in blue in the contig or single sequence.

Figure: vector recognition sequence highlighted for vector cleaning

How to disable vector removal?

If you do not want to use recognition sequences to remove contaminant vector sequences, make sure that none of the recognition sequences in your database is active (checked).

Why is the vector not cut in some contigs?

If the recognition (vector) sequence contains an IUPAC ambiguity code, the vector is still detected and cut, because DNA Baser Assembler interprets the code as the bases it stands for.
However, if the CONTIG contains an IUPAC base in the vector region, the vector is not detected, because the contig no longer matches the recognition sequence: the ambiguous base breaks the match.
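The asymmetry can be reproduced by turning each position of the recognition sequence into the set of plain bases its IUPAC code stands for, while the contig is compared base by base. This is a sketch of one way to get that behavior, not DNA Baser's own code; `recognition_regex` and `matches` are hypothetical names:

```python
import re

# IUPAC nucleotide codes -> the plain bases each code stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def recognition_regex(recognition: str) -> re.Pattern:
    """Compile a (possibly degenerate) recognition sequence into a regex in
    which every position is a class of plain bases, e.g. 'R' -> '[AG]'."""
    classes = []
    for code in recognition.upper():
        if code not in IUPAC:
            raise ValueError(f"not an IUPAC nucleotide code: {code!r}")
        classes.append("[" + IUPAC[code] + "]")
    return re.compile("".join(classes))


def matches(contig: str, recognition: str) -> bool:
    """True if the contig contains a complete match of the recognition sequence."""
    return recognition_regex(recognition).search(contig.upper()) is not None


print(matches("GGACGAATTCGCCCTTGG", "RCGAATTCGCCCTT"))  # True: R in the recognition sequence covers A
print(matches("GGRCGAATTCGCCCTTGG", "ACGAATTCGCCCTT"))  # False: R in the contig breaks the match
```

The character classes contain only A, C, G and T, so an ambiguity code in the contig can never satisfy them.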

Last update on Aug. 2009

Automatic vector cleaning

To remove the recognition sequences (primer/vector) from a whole set of sequences at once, use the Batch sequence processing tool. Every sequence passed through this tool has its recognition sequence removed automatically.
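For scripted pipelines outside DNA Baser, a comparable batch step might look like the sketch below. It assumes plain A/C/G/T sequences, treats a direct match as the 5' flank and a reverse-complement match as the 3' flank, and cuts the recognition sequences along with the vector. `clean_sequence` and `batch_clean` are hypothetical and are not the API of the Batch sequence processing tool:

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a plain A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def clean_sequence(seq: str, recognition_seqs: list) -> str:
    """Cut the vector (and the recognition sites) from one sequence."""
    s = seq.upper()
    start, end = 0, len(s)
    for site in recognition_seqs:
        site = site.upper()
        i = s.find(site)                      # direct match: 5' flank
        if i >= 0:
            start = max(start, i + len(site))
        j = s.find(reverse_complement(site))  # reverse-complement match: 3' flank
        if j >= 0:
            end = min(end, j)
    return s[start:end] if start <= end else s


def batch_clean(sequences: dict, recognition_seqs: list) -> dict:
    """Apply clean_sequence to every sequence in a {name: sequence} set."""
    return {name: clean_sequence(seq, recognition_seqs)
            for name, seq in sequences.items()}


sites = ["ACGAATTCGCCCTT", "GCGAATTCGCCCTT"]
reads = {
    "forward": "TTTTTACGAATTCGCCCTTGATTACAAAGGGCGAATTCGCTTTTT",
    "reverse": "AAAAAGCGAATTCGCCCTTTGTAATCAAGGGCGAATTCGTAAAAA",
    "no_sites": "GATTACAGATTACA",
}
print(batch_clean(reads, sites))
# {'forward': 'GATTACA', 'reverse': 'TGTAATC', 'no_sites': 'GATTACAGATTACA'}
```

A sequence without a complete match is passed through unchanged, which mirrors the rule that vector is removed only when a recognition sequence is found.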

Designing recognition sequences

This complete tutorial shows how to design recognition sequences for the pGEM-T Easy vector and then use them in DNA Baser.

SciVance Technologies
